AITopics | surrogate label

d862f7f5445255090de13b825b880d59-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 09:49:58 GMT

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal > Porto > Porto (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models

Neural Information Processing Systems, Dec-26-2025, 22:18:20 GMT

In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties, like asymptotic unbiasedness and proper uncertainty quantification, which are fundamental to CSS research.

large language model, machine learning, natural language, (7 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.39)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.79)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)

9b22a40256b079f338827b0ff1f4792b-AuthorFeedback.pdf

Neural Information Processing Systems, Nov-14-2025, 17:10:54 GMT

One reason could be the number of models used for the violin plot.

artificial intelligence, metaepoch, surrogate label, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.50)

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models

Neural Information Processing Systems, Oct-9-2025, 08:53:11 GMT

In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs).

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal > Porto > Porto (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)

9b22a40256b079f338827b0ff1f4792b-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-15-2025, 08:48:38 GMT

One reason could be the number of models used for the violin plot.

artificial intelligence, metaepoch, surrogate label, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.50)

AI-Assisted Decision-Making for Clinical Assessment of Auto-Segmented Contour Quality

Wang, Biling, Maniscalco, Austen, Bai, Ti, Wang, Siqiu, Dohopolski, Michael, Lin, Mu-Han, Shen, Chenyang, Nguyen, Dan, Huang, Junzhou, Jiang, Steve, Wang, Xinlei

arXiv.org Artificial Intelligence, May-13-2025

Purpose: This study introduces a novel Deep Learning (DL)-based quality assessment (QA) approach specifically designed for evaluating auto-generated contours (auto-contours) in auto-segmentation for radiotherapy, with a focus on Online Adaptive Radiotherapy (OART). The proposed method leverages Bayesian Ordinal Classification (BOC), combined with calibrated thresholds derived from uncertainty quantification, to deliver confident QA predictions. This approach addresses key challenges in clinical auto-segmentation QA workflows, such as the absence of ground truth contours, limited availability of manually labeled data, and inherent uncertainty in AI model predictions. Methods: We developed a BOC model to classify the quality of auto-contours and quantify uncertainty. To enhance predictive reliability, we implemented a calibration step to determine optimal uncertainty thresholds that meet specific clinical accuracy requirements. The method was validated under three distinct data availability scenarios: absence of manual labels, limited manual labeling, and extensive manual labeling. We specifically tested our method for auto-segmented rectum contours in prostate cancer radiotherapy. Geometric surrogate labels were employed in the absence of manual labels, transfer learning was applied when manual labels were limited, and direct use of manual labels was performed when extensive labeling was available. Results: The BOC model demonstrated robust performance across all data availability scenarios for confident predictions, with significant accuracy gains when pre-trained with surrogate labels and fine-tuned with limited manually labeled data. Specifically, fine-tuning the pretrained model with just 30 manually labeled cases and calibrating with 34 subjects achieved an accuracy of over 90% against manual labels in the test dataset.
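
The calibration step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the thresholds are chosen, so everything below is an assumption: the function name `calibrate_threshold`, the rule "accept a prediction when its uncertainty is at or below the threshold", and the toy numbers are hypothetical and are not the authors' procedure.

```python
def calibrate_threshold(uncertainty, correct, target_accuracy):
    """Largest uncertainty threshold whose accepted predictions meet the target.

    uncertainty: per-case uncertainty scores from a held-out calibration set
    correct:     per-case booleans, True if the prediction matched the label
    Returns None when no threshold reaches the target accuracy.
    (Hypothetical rule: a case is "accepted" if uncertainty <= threshold.)
    """
    pairs = sorted(zip(uncertainty, correct))
    best = None
    hits = 0
    for i, (u, ok) in enumerate(pairs, start=1):
        hits += 1 if ok else 0
        # Evaluate only after the last case sharing this uncertainty value,
        # so tied cases are accepted or rejected together.
        if i < len(pairs) and pairs[i][0] == u:
            continue
        if hits / i >= target_accuracy:
            best = u
    return best


# Toy calibration set (made-up values, for illustration only).
scores = [0.1, 0.2, 0.3, 0.4, 0.5]
is_correct = [True, True, True, False, True]
threshold_90 = calibrate_threshold(scores, is_correct, 0.9)
```

With these toy values, the three least uncertain cases are all correct, so a 90% requirement is met up to an uncertainty of 0.3; predictions above that threshold would be referred for manual review under this assumed rule.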

artificial intelligence, contour, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.00308

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Texas > Tarrant County > Arlington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Supervised Pretraining for Material Property Prediction

Rahman, Chowdhury Mohammad Abid, Romero, Aldo H., Gyawali, Prashnna K.

arXiv.org Artificial Intelligence, Apr-30-2025

Accurate prediction of material properties facilitates the discovery of novel materials with tailored functionalities. Deep learning models have recently shown superior accuracy and flexibility in capturing structure-property relationships. However, these models often rely on supervised learning, which requires large, well-annotated datasets that are expensive and time-consuming to produce. Self-supervised learning (SSL) offers a promising alternative by pretraining on large, unlabeled datasets to develop foundation models that can be fine-tuned for material property prediction. In this work, we propose supervised pretraining, where available class information serves as surrogate labels to guide learning, even when downstream tasks involve unrelated material properties. We evaluate this strategy on two state-of-the-art SSL models and introduce a novel framework for supervised pretraining. To further enhance representation learning, we propose a graph-based augmentation technique that injects noise to improve robustness without structurally deforming material graphs. The resulting foundation models are fine-tuned for six challenging material property predictions, achieving significant performance gains over baselines, ranging from 2% to 6.67% improvement in mean absolute error (MAE) and establishing a new benchmark in material property prediction. This study represents the first exploration of supervised pretraining with surrogate labels in material property prediction, advancing methodology and application in the field.

artificial intelligence, machine learning, property prediction, (17 more...)

arXiv.org Artificial Intelligence

2504.20112

Country:

North America > United States > West Virginia (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models

Neural Information Processing Systems, Jan-19-2025, 23:40:19 GMT

In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties, like asymptotic unbiasedness and proper uncertainty quantification, which are fundamental to CSS research.

large language model, machine learning, natural language, (7 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)

Dynamic Classification of Latent Disease Progression with Auxiliary Surrogate Labels

Cai, Zexi, Zeng, Donglin, Marder, Karen S., Honig, Lawrence S., Wang, Yuanjia

arXiv.org Machine Learning, Dec-10-2024

Disease progression prediction based on patients' evolving health information is challenging when true disease states are unknown due to diagnostic capabilities or high costs. For example, the absence of gold-standard neurological diagnoses hinders distinguishing Alzheimer's disease (AD) from related conditions such as AD-related dementias (ADRDs), including Lewy body dementia (LBD). Combining temporally dependent surrogate labels and health markers may improve disease prediction. However, existing literature models informative surrogate labels and observed variables that reflect the underlying states using purely generative approaches, limiting the ability to predict future states. We propose integrating the conventional hidden Markov model as a generative model with a time-varying discriminative classification model to simultaneously handle potentially misspecified surrogate labels and incorporate important markers of disease progression. We develop an adaptive forward-backward algorithm with subjective labels for estimation, and utilize the modified posterior and Viterbi algorithms to predict the progression of future states or new patients based on objective markers only. Importantly, the adaptation eliminates the need to model the marginal distribution of longitudinal markers, a requirement in traditional algorithms. Asymptotic properties are established, and significant improvement with finite samples is demonstrated via simulation studies. Analysis of the neuropathological dataset of the National Alzheimer's Coordinating Center (NACC) shows much improved accuracy in distinguishing LBD from AD.
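
For readers unfamiliar with the Viterbi algorithm named in the abstract, here is a minimal sketch of the standard textbook version for a plain hidden Markov model. It is not the modified algorithm the paper proposes, and the two-state toy model (state names, probabilities, observations) is made up for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               state preceding s on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        step = {}
        for s in states:
            p, arg = max((prev[r][0] * trans_p[r][s], r) for r in states)
            step[s] = (p * emit_p[s][o], arg)
        best.append(step)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for step in reversed(best[1:]):
        state = step[state][1]
        path.append(state)
    return path[::-1]


# Toy model with illustrative numbers only.
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
path = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
```

For this toy input the decoded path is Healthy, Healthy, Fever. A production version would work in log space to avoid underflow on long sequences.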

artificial intelligence, diagnosis, machine learning, (19 more...)

arXiv.org Machine Learning

2412.08088

Country:

North America > United States > New York (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models

Egami, Naoki, Hinck, Musashi, Stewart, Brandon M., Wei, Hanying

arXiv.org Machine Learning, Oct-30-2023

In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties, like asymptotic unbiasedness and proper uncertainty quantification, which are fundamental to CSS research. We show that direct use of surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80-90%. To address this, we build on debiased machine learning to propose the design-based supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of high-quality, gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased and without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without inferential guarantees.
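
The idea of correcting biased surrogates with a known-probability gold-standard sample can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' DSL implementation: it estimates only a population mean rather than regression coefficients, it uses the raw surrogate label as the prediction, and it assumes the design-adjusted form `g + (R / pi) * (Y - g)`, where `R` marks documents sampled for gold-standard labeling with known probability `pi`. All function names and simulation settings are hypothetical.

```python
import random


def design_adjusted_mean(surrogate, gold, labeled, pi):
    """Mean of g + (R / pi) * (Y - g) over all documents.

    surrogate: surrogate label g for every document
    gold:      gold-standard label Y, or None where it was not collected
    labeled:   R, True if the document was sampled for gold-standard labeling
    pi:        known probability of being sampled for gold-standard labeling
    """
    total = 0.0
    for g, y, r in zip(surrogate, gold, labeled):
        # The correction term is used only where a gold label exists.
        total += g + ((y - g) / pi if r else 0.0)
    return total / len(surrogate)


def simulate(n=20000, pi=0.1, seed=0):
    """Compare the naive surrogate mean with the design-adjusted mean."""
    rng = random.Random(seed)
    truth, surrogate, labeled = [], [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        # Biased surrogate: false-positive rate 0.20, false-negative rate 0.05.
        if y == 1:
            g = 0 if rng.random() < 0.05 else 1
        else:
            g = 1 if rng.random() < 0.20 else 0
        truth.append(y)
        surrogate.append(g)
        labeled.append(rng.random() < pi)
    gold = [y if r else None for y, r in zip(truth, labeled)]
    true_mean = sum(truth) / n
    naive_mean = sum(surrogate) / n
    adjusted_mean = design_adjusted_mean(surrogate, gold, labeled, pi)
    return true_mean, naive_mean, adjusted_mean


true_mean, naive_mean, adjusted_mean = simulate()
```

In this simulation the surrogate is roughly 85% accurate, yet its mean is far from the true prevalence, while the adjusted mean stays close to it using gold labels for only about 10% of documents. The correction works because the sampling probability is set by the researcher, so dividing by `pi` reweights the labeled residuals without needing a model of how the surrogate errs.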

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2306.04746

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal > Porto > Porto (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
